Name ambiguity, such as multiple authors sharing the same name, is common in academic digital libraries. It complicates academic data management and analysis, which makes name disambiguation necessary. Name disambiguation divides the publications that bear the same name into groups, each belonging to a unique author. The large amount of attribute information in publications traps traditional methods in the quagmire of feature selection: these methods select attributes manually and weight them equally, which usually hurts accuracy. The proposed method builds on representation learning for heterogeneous networks and on clustering, and it uses self-attention to address the problem. The representation of each publication combines a structural representation and a semantic representation. The structural representation is obtained through meta-path-based sampling and a skip-gram-based embedding method, and meta-path-level attention is introduced to learn the weight of each feature automatically. The semantic representation is generated with NLP tools. Our method achieves higher name disambiguation accuracy than the baselines, and ablation experiments demonstrate the improvements contributed by feature selection and by meta-path-level attention. The experimental results show that the new method captures most of the attributes of publications while reducing the impact of redundant information.
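The sampling and fusion steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions. The toy graph, the "pap" (paper-author-paper) meta-path, and the attention form score_k = q · tanh(z_k) are illustrative choices, similar in spirit to semantic-level attention in heterogeneous graph attention networks, and not the paper's exact model. The skip-gram training that would turn walks into embeddings is omitted, so the embeddings are placeholders and the query vector q is fixed rather than learned.

```python
import math
import random

# Toy heterogeneous academic graph (hypothetical data). The first character of each
# node id encodes its type: "p" = paper, "a" = author, "v" = venue.
GRAPH = {
    "p1": ["a1", "v1"], "p2": ["a1", "v2"], "p3": ["a2", "v1"],
    "a1": ["p1", "p2"], "a2": ["p3"],
    "v1": ["p1", "p3"], "v2": ["p2"],
}


def node_type(node):
    return node[0]


def metapath_walk(graph, start, metapath, length, rng):
    """Random walk whose node types follow a symmetric meta-path such as "pap",
    repeated until `length` nodes are collected (or a dead end is reached)."""
    cycle = metapath[:-1]  # "pap" -> repeating type cycle "p", "a", "p", "a", ...
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in graph[walk[-1]] if node_type(n) == wanted]
        if not candidates:
            break  # no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def metapath_attention(embeddings, query):
    """Fuse one embedding per meta-path: score_k = query . tanh(z_k),
    weights = softmax(scores), output = sum_k weight_k * z_k."""
    scores = [sum(q * math.tanh(z) for q, z in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[i] for w, emb in zip(weights, embeddings)) for i in range(dim)]
    return weights, fused


rng = random.Random(0)
print(metapath_walk(GRAPH, "p1", "pap", 5, rng))
# Placeholder per-meta-path embeddings of one paper (in practice learned by skip-gram).
weights, fused = metapath_attention([[0.2, 0.9], [0.7, 0.1]], query=[1.0, 0.5])
print(weights, fused)
```

Because the attention weights come from a softmax, each meta-path's contribution is learned (here, fixed) rather than set by hand. This is the kind of automatic feature weighting the abstract contrasts with manual, equal-weight attribute selection.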
The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation, and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark with more classes and footage. We provide statistical and qualitative analyses and assess trends in the best-performing methodologies among over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code, and leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
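Color image denoising is frequently encountered in various image processing and computer vision tasks. A traditional strategy converts the RGB image into a less correlated color space and denoises each channel of the new space separately. However, this strategy cannot fully exploit the correlation information between channels and is insufficient to obtain satisfactory results. To address this problem, this paper proposes a new multichannel optimization model for color image denoising under the nuclear norm minus Frobenius norm minimization framework. Specifically, based on block matching, the color image is decomposed into overlapping RGB patches. For each patch, we stack its similar neighbors to form the corresponding patch matrix. The proposed model operates on the patch matrix to recover its noise-free version. During recovery, a) a weight matrix is introduced to fully exploit the noise differences between channels, and b) the singular values are shrunk adaptively without the need to assign weights. With these, the proposed model achieves promising results while remaining simple. To solve the proposed model, an accurate and efficient algorithm is built on the alternating direction method of multipliers (ADMM) framework. The solution of each update step can be expressed analytically in closed form. Rigorous theoretical analysis proves that the solution sequences generated by the proposed algorithm converge to their respective stationary points. Experimental results on synthetic and real noise datasets demonstrate that the proposed model outperforms state-of-the-art models.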
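The adaptive singular-value shrinkage can be illustrated as follows. For a matrix, the nuclear norm minus the Frobenius norm equals ||σ||_1 - ||σ||_2 of its singular values σ, and both norms are unitarily invariant. A proximal step on a patch matrix Y = U diag(σ) Vᵀ therefore reduces to shrinking σ and rebuilding U diag(σ̂) Vᵀ. The sketch below implements only that vector step, using the known closed-form proximal operator of λ(||x||_1 - ||x||_2). It assumes the SVD is computed elsewhere and omits the paper's channel weight matrix, so it shows the kind of update that appears inside ADMM, not the authors' algorithm.

```python
import math


def prox_l1_minus_l2(y, lam):
    """Minimize lam * (||x||_1 - ||x||_2) + 0.5 * ||x - y||^2 over x (closed form).

    Applied to the singular values of a patch matrix, this is one way to realize
    the nuclear-norm-minus-Frobenius-norm shrinkage step (an assumption here).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    ymax = max(abs(v) for v in y)
    if ymax == 0:
        return [0.0] * len(y)
    if ymax > lam:
        # Soft-threshold, then rescale. Entry i ends up shrunk by lam * (1 - z_i / ||z||),
        # so larger values are shrunk less than plain soft-thresholding would shrink them.
        z = [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]
        norm_z = math.sqrt(sum(v * v for v in z))
        scale = (norm_z + lam) / norm_z
        return [v * scale for v in z]
    # 0 < max|y_i| <= lam: a minimizer keeps only the largest-magnitude entry.
    i = max(range(len(y)), key=lambda k: abs(y[k]))
    x = [0.0] * len(y)
    x[i] = y[i]
    return x


sigma = [3.0, 2.0, 0.5]  # hypothetical singular values of a noisy patch matrix
print(prox_l1_minus_l2(sigma, lam=1.0))
```

With these inputs the smallest singular value is zeroed, while the largest one loses much less than λ. This illustrates how the shrinkage adapts to magnitude without explicit per-value weights.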
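As a common security tool, visible watermarking has been widely applied to protect the copyright of digital images. However, recent works have shown that visible watermarks can be removed by DNNs without damaging their host images. Such watermark-removal techniques pose a great threat to image ownership. Inspired by the vulnerability of DNNs to adversarial perturbations, we propose a novel defense mechanism that uses adversarial machine learning for good. From the adversary's perspective, blind watermark-removal networks can be posed as our target models. We then optimize imperceptible adversarial perturbations on the host images to proactively attack watermark-removal networks, an approach we call Watermark Vaccine. Specifically, two types of vaccines are proposed. The Disrupting Watermark Vaccine (DWV) causes the host image to be ruined along with the watermark after it passes through a watermark-removal network. In contrast, the Inerasable Watermark Vaccine (IWV) works the other way, trying to keep the watermark unremoved and still noticeable. Extensive experiments demonstrate the effectiveness of our DWV/IWV in preventing watermark removal, especially across various watermark-removal networks.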
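Optimizing an imperceptible perturbation against a removal network is typically done with projected gradient methods. The sketch below uses standard L-infinity projected gradient ascent with signed steps. The "removal network" is a hypothetical element-wise affine map with an analytic gradient, and the DWV-style objective (push the remover's output away from the host image) is an assumed simplification. An IWV-style vaccine would swap in a different objective, such as keeping the output close to the watermarked input. Neither the losses nor the networks are the paper's.

```python
def pgd_linf(x, grad_fn, eps, step, iters):
    """Signed-gradient ascent on a loss, projected onto the L-infinity ball of
    radius eps around x and onto the valid pixel range [0, 1]."""
    adv = list(x)
    for _ in range(iters):
        g = grad_fn(adv)
        # ascent step along the sign of the gradient
        adv = [a + step * ((gi > 0) - (gi < 0)) for a, gi in zip(adv, g)]
        # keep the perturbation imperceptible (|adv - x| <= eps) and pixels valid
        adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(adv, x)]
        adv = [min(max(a, 0.0), 1.0) for a in adv]
    return adv


# Hypothetical stand-in for a watermark-removal network: an element-wise affine map.
A, B = 0.9, 0.05


def remover(v):
    return [A * vi + B for vi in v]


def disruption_loss(v, host):
    # DWV-style objective (assumed form): distance between remover output and host image.
    return sum((r - h) ** 2 for r, h in zip(remover(v), host))


def disruption_grad(v, host):
    # Analytic gradient of disruption_loss with respect to v.
    return [2 * A * (A * vi + B - hi) for vi, hi in zip(v, host)]


host = [0.2, 0.5, 0.8]  # hypothetical pixel values of a host image
vaccinated = pgd_linf(host, lambda v: disruption_grad(v, host), eps=8 / 255, step=2 / 255, iters=10)
print(disruption_loss(host, host), disruption_loss(vaccinated, host))
```

With a real network, the analytic gradient would be replaced by backpropagation through the removal model, while the projection steps stay the same.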
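This work aims to advance temporal action detection (TAD) using an encoder-decoder framework with action queries, similar to DETR, which has shown great success in object detection. However, the framework suffers from several problems when directly applied to TAD: insufficient exploration of inter-query relations in the decoder, inadequate classification training due to the limited number of training samples, and unreliable classification scores at inference. To this end, we first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations. Moreover, we propose two losses to facilitate and stabilize the training of action classification. Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries. The proposed method, named ReAct, achieves state-of-the-art performance on THUMOS14 with much lower computational cost than previous methods. In addition, extensive ablation studies are conducted to verify the effectiveness of each proposed component. The code is available at https://github.com/sssste/reaeact.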
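Relation-guided attention among queries can be pictured as self-attention with a mask that decides which queries may attend to which. This is a simplified sketch. The relation rule used here is a hypothetical stand-in, not ReAct's actual definition: a query may not attend to another query whose predicted segment overlaps heavily (temporal IoU at or above a threshold), on the assumption that such a query is likely a duplicate. Attention itself is plain scaled dot-product.

```python
import math


def temporal_iou(a, b):
    """IoU of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def relation_mask(segments, iou_thresh):
    """Hypothetical relation rule: query i may attend to query j unless their
    segments overlap heavily. Attending to itself is always allowed."""
    n = len(segments)
    return [[i == j or temporal_iou(segments[i], segments[j]) < iou_thresh
             for j in range(n)] for i in range(n)]


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention where mask[i][j] == False blocks query i -> key j."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
                  for j, kj in enumerate(k)]
        m = max(scores)  # finite, because self-attention is always allowed
        w = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 for blocked pairs
        total = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / total for c in range(len(v[0]))])
    return out


segments = [(0.0, 10.0), (1.0, 10.0), (20.0, 30.0)]  # hypothetical predicted segments
mask = relation_mask(segments, iou_thresh=0.5)
queries = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical query features
print(masked_attention(queries, queries, queries, mask))
```

Here query 0 never attends to query 1 (IoU 0.9), so near-duplicate proposals do not reinforce each other during decoding.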
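Link prediction aims to predict links of a network that are not directly visible, and it has profound applications in biological and social systems. Despite the intensive use of topological features in this task, it is unclear to what extent a particular feature can be leveraged to infer missing links. Here, we show that the maximum capability of a topological feature follows a simple mathematical expression, which is independent of how an index gauges the feature. Hence, a family of indices associated with one topological feature shares the same performance limit. A feature's capability is lifted in supervised prediction, which in general gives better results than unsupervised prediction. The universality of the uncovered pattern is empirically verified on 550 structurally diverse networks, and the pattern can be applied to feature selection and to the analysis of network characteristics associated with a topological feature in link prediction.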
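As a concrete example of "a family of indices associated with one topological feature", the classic similarity indices below all gauge the same feature, the common neighbors of a node pair, but weight it differently. The sketch uses the standard definitions of Common Neighbors, Jaccard, Adamic-Adar, and Resource Allocation. The toy edge list is made up. In unsupervised prediction, unobserved pairs would simply be ranked by one of these scores.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    # Each shared neighbor z has degree >= 2, so log(degree) > 0.
    return sum(1 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def resource_allocation(adj, x, y):
    return sum(1 / len(adj[z]) for z in adj[x] & adj[y])


# Hypothetical toy network; score the unobserved pair (a, b).
adj = build_adjacency([("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")])
for index in (common_neighbors, jaccard, adamic_adar, resource_allocation):
    print(index.__name__, index(adj, "a", "b"))
```

All four scores are functions of the same shared-neighbor set. According to the abstract, indices built on one feature in this way share a common performance limit.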
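One of the main concerns in distributed learning is communication efficiency, since model aggregation at each round of training can involve millions to billions of parameters. Several model compression methods, such as gradient quantization and sparsification, have been proposed to improve the communication efficiency of model aggregation. However, the information-theoretic minimum communication cost for a given distortion of the gradient estimators is still unknown. In this paper, we study the fundamental limit of model aggregation in distributed learning from a rate-distortion perspective. By formulating model aggregation as a vector Gaussian CEO problem, we derive the rate region and the sum-rate-distortion function for the model aggregation problem, which reveal the minimum communication rate at a given upper bound on gradient distortion. We also analyze the per-iteration and total communication costs based on the sum-rate-distortion function, using gradient statistics from real-world datasets. We find that the communication gain from exploiting the correlation between worker nodes is significant for SignSGD, and that allowing higher distortion in the gradient estimator can achieve a lower total communication cost in gradient compression.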
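The rate-distortion view can be made concrete with the classical result for independent Gaussian components: R(D) = Σ_i ½ log2(σ_i² / D_i), where reverse water-filling sets D_i = min(θ, σ_i²) and chooses θ so that Σ_i D_i = D. This is a textbook building block, not the paper's vector Gaussian CEO bound. It ignores that several workers observe correlated noisy versions of the same gradient, and the per-coordinate variances below are made up rather than taken from real gradient statistics.

```python
import math


def gaussian_rate_distortion(variances, total_distortion, iters=200):
    """Rate in bits for independent zero-mean Gaussian components with the given
    (positive) variances under a total mean-squared-error budget, computed by
    reverse water-filling: D_i = min(theta, var_i) with sum_i D_i = total_distortion."""
    if total_distortion <= 0 or any(v <= 0 for v in variances):
        raise ValueError("distortion budget and variances must be positive")
    if total_distortion >= sum(variances):
        return 0.0, list(variances)  # sending nothing already meets the budget
    lo, hi = 0.0, max(variances)
    for _ in range(iters):  # bisection on the water level theta
        theta = (lo + hi) / 2
        if sum(min(theta, v) for v in variances) > total_distortion:
            hi = theta
        else:
            lo = theta
    theta = (lo + hi) / 2
    dists = [min(theta, v) for v in variances]
    rate = sum(0.5 * math.log2(v / d) for v, d in zip(variances, dists))
    return rate, dists


# Hypothetical per-coordinate gradient variances and a total distortion budget.
print(gaussian_rate_distortion([4.0, 1.0], 1.0))
```

The shape of the trade-off mirrors the abstract's last finding. Raising the allowed distortion lowers the water level's cost in bits, and components whose variance falls below the water level are not transmitted at all.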
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps, or events, leaving out fine-grained interaction details about the surgical activity; yet those details are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented; they recognize surgical action triplets directly from surgical videos and achieve mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them and an in-depth analysis of the results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved. It also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
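Since results are reported as mean average precision (mAP) over triplet classes, a sketch of the metric may help. It assumes frame-level multi-label predictions and the common non-interpolated AP definition, which is the mean of the precision values at the rank of each positive. AP is then averaged over classes that have at least one positive. The challenge's official evaluation may aggregate differently (for example, per video), so treat this as illustrative.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive example.
    Returns None when the class has no positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(score_matrix, label_matrix):
    """score_matrix[f][c]: predicted score of triplet class c on frame f.
    Classes without any positive frame are skipped."""
    aps = []
    for c in range(len(score_matrix[0])):
        ap = average_precision([row[c] for row in score_matrix],
                               [row[c] for row in label_matrix])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)


# Hypothetical predictions for 3 frames and 2 triplet classes.
scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2]]
labels = [[1, 0], [0, 1], [1, 0]]
print(mean_average_precision(scores, labels))
```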
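Recently, multi-modal named entity recognition (MNER) has attracted a lot of attention. Most work uses image information through region-level visual representations obtained from a pretrained object detector and relies on an attention mechanism to model the interactions between image and text representations. However, such interactions are difficult to model, because the image and text representations are trained separately on data of their respective modalities and are not aligned in the same space. As text representations play the most important role in MNER, in this paper we propose Image-text Alignments (ITA) to align image features into the textual space, so that the attention mechanism in transformer-based pretrained textual embeddings can be better utilized. ITA first treats regional object tags and image-level captions as local and global visual contexts, concatenates them with the input text as a new cross-modal input, and then feeds this input into a pretrained textual embedding model. This makes it easier for the attention module of the pretrained textual embedding model to model the interaction between the two modalities, since both are represented in the textual space. ITA further aligns the output distributions predicted from the cross-modal input view and the text-only input view, so that the MNER model becomes more practical and more robust to noise in images. In our experiments, we show that ITA models can achieve state-of-the-art accuracy on multi-modal named entity recognition datasets, even without image information.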
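The two ITA ingredients described above, textual visual contexts and cross-view alignment, can be sketched as follows. This is a minimal sketch under stated assumptions. The " [SEP] " separator and the input layout are hypothetical, and the KL direction (cross-modal view as the reference distribution) is an assumption rather than the paper's confirmed choice. The embedding model and the NER tagger themselves are omitted.

```python
import math


def build_cross_modal_input(text, object_tags, caption, sep=" [SEP] "):
    """Concatenate the sentence with visual contexts expressed as text
    (object tags as local context, caption as global context). Format is hypothetical."""
    return text + sep + " ".join(object_tags) + sep + caption


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over entity labels."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def alignment_loss(cross_view_probs, text_view_probs):
    """Average per-token KL from the cross-modal view to the text-only view (assumed direction)."""
    pairs = list(zip(cross_view_probs, text_view_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


print(build_cross_modal_input("Kevin Love plays in Cleveland", ["person", "ball"], "a man holding a ball"))
# Hypothetical per-token label distributions from the two views.
print(alignment_loss([[0.7, 0.3], [0.1, 0.9]], [[0.5, 0.5], [0.2, 0.8]]))
```

Minimizing such a loss pushes the text-only view toward the predictions made with visual context. This is consistent with the abstract's claim that ITA models remain accurate even when no image is available.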
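Cascade prediction aims to model information diffusion in networks. Most previous methods concentrate on mining either structural or sequential features from the network and the propagation path. Recent work has combined network structure and sequential features using graph neural networks and recurrent neural networks. However, the limitations of spectral or spatial methods restrict further improvement in prediction performance. Moreover, recurrent neural networks are time-consuming and computationally expensive, which makes prediction inefficient. Here, we propose a novel method, CCASGNN, that considers individual profiles, structural features, and sequence information. The method uses a collaborative framework of GAT and GCN and stacks positional encoding into the graph neural network layers, which distinguishes it from all existing methods and yields good performance. Experiments on two real-world datasets confirm that our method significantly improves prediction accuracy compared with state-of-the-art approaches. Furthermore, an ablation study investigates the contribution of each component of our method.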
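The idea of stacking positional encoding into graph layers can be sketched with one GCN propagation step over node features that mix user-profile features with an encoding of each user's position in the cascade. Several details here are assumptions, not the paper's design. The encoding is the Transformer-style sinusoidal form, it is added (not concatenated) to the profile features, and the propagation step omits learnable weights and the nonlinearity. The GAT branch is omitted, and the small cascade is made up.

```python
import math


def sinusoidal_encoding(position, dim):
    """Transformer-style sinusoidal positional encoding (assumed form)."""
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def gcn_propagate(adj, features):
    """One GCN propagation step H' = D^{-1/2} (A + I) D^{-1/2} H,
    without learnable weights or nonlinearity."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(features[0])
    return [[sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * features[j][c] for j in range(n))
             for c in range(dim)] for i in range(n)]


# Hypothetical cascade: users 0 -> 1 -> 2 adopt the message in that order.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical user-profile features
features = [[p + e for p, e in zip(prof, sinusoidal_encoding(t, 2))]
            for t, prof in enumerate(profiles)]
print(gcn_propagate(adj, features))
```

Adding the encoding before propagation lets neighboring users' adoption order flow through the graph layer, which is one plausible way a GNN can capture sequence information without a recurrent network.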